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We present a new and general Monte Carlo iteration method for generalized ensembles. It consists 
of two elements: (1) a simple algorithm to distinguish between distributions arising from respectively 
equilibrium- and non-equilibrium processes, and (2) a selfconsistent maximum-likelyhood estima- 
tion of the unknown thermodynamic quantities, based on the information obtained at all previous 
iterations. We demonstrate the efficiency of the method by calculating the density of state function 
of a homopolymer with 16 3 monomers. This represents an improvement of at least an order of 
" l". magnitude compared to previous studies. 
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i-^ ■ The method of Monte Carlo (MC) integration has proven succesfull for calculating thermodynamic properties of 
model systems with moderate number of degrees of freedom Q. The basic idea is to sample the phase space by 
generating a Markov chain of states through a fixed matrix of transition probabilities. These probabilities are chosen 
such that the condition of detailed balance is fulfilled for the statistical ensemble in concern. For instance, the 
traditional Metropolis algorithm Q samples directly in the canonical ensemble (CE) by the choise of Boltzmann 
, transition probabilites. 

A generic deficiency of the Metropolis sampling technique is the slow relaxation of the Markov chain which typically 
appears at points of phase transitions or at low temperatures. For systems with a rugged energy landscape, such as 
spin glasses j| , random heteropolymers 0] or even spin-interacting homopolymers || , this problem effectively causes 
the chain to be trapped in the phase space around local energy minima. In any case, slow relaxation easily leads to 
results which are errorously sensitive to the initial states of the Markov chain. 

In the past decade, a variety of MC-methods, based on non-Boltzmann probability distributions, have been devel- 
^ ■ oped to improve the phase space sampling, They are commonly referred to as broad energy ensemble or generalized 
ensemble (GE) methods corresponding to the generalization of the partition function, Z: 
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Here, <fi is some coarse-grained state variable, T is the density of states and u)((j>) are the ensemble-defining weights. 
Different choices have been proposed in the litterature. In the multicanonical approach (MU) [Q, (j> is simply the energy, 
E, of the system and lumu{E) = T(E)^ 1 . This approach also includes the entropic sampling [^J. In the so-called 
^ l//c-ensemble [Q the weights are defined as u 1 / k (E) = k(E)^ 1 , with k(E) = J2e'<e^(E')- Finally, for Simulated 
Tempering (ST) |lO|[Tl| ] tf> also contains a temperature index I, i.e. cj> = {E, I}, and wst{E, I) = cxp(— fyE + fi), where 
fi is the reduced free energy at the inverse temperature f3i, 

e-fi =ZcB(fii) = Yr(E)»- fhE 
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Here, Zce{P) is the usual canonical partition function. By inspection of eq. ([!]) it is clear that MU is equivalent to 
uniform energy sampling and ST is equivalent to a uniform temperature sampling in the preassigned set of values 
The last method practically corresponds to a uniform entropy sampling JlJ]. 

Unlike the canonical ensemble, the probability weights are not a priori known in the generalized ensembles, since u 
in all cases is a function, g, of the density of states (or the free energy), uj — g(T), and the knowledge of T corresponds 
exactly to solving the thermodynamics. The standard solution to this problem is to determine lu through an iterative 
procedure, uJi+\ = F\(Ni, u>i), where AT, = Ni(<p) is the observed histogram over <f> obtained in the z'th simulation with 
the weights Wj. The map, F\, is contructed to have the desired fixed point, F\ (N, g(T)) = g(T), where N — N(<f>) is 
the expected number of counts in a simulation with weights g(T). Typically, the iteration process is stopped when 
the observed histogram and the weights are sufficiently close to the fixed point requirement. A large run is then 
performed in which the physical quantities of interest are studied. 

Despite several succesfull applications of this iteration scheme (see eg. p|,p|-p^[), it suffers from the loss of statistics 
inhereted in the updating rule for the weights. Since Wj+i only depends on the last histogram, Ni, a nescessary 
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requirement to insure convergence is that the set S(Ni) = {<fi\Ni((j)) > 0} encloses at least the same region of the 
(coarse-grained) phase space as the previously obtained set <S(A^_i), i.e. S(Ni) D S(iVj_i). Otherwise, the iteration 
Fi(Ni,LOi) will lead to a worse estimate of g(T) for such regions of the phase space. This is clearly a strong statistical 
requirement. In particular, it prevents for a more adiabatic updating of the weights, which otherwise could speed up 
the convergence. 

In this letter we propose a new iteration scheme which accounts for the information obtained from in principle all 
previous iterations. More precisely, a selfconsistent map uJi+i = Fm {{N^_^ , ^(i-j)}*!^ 1 ) > wm be derived where the 
'memory' M can be chosen arbitrary large. As such there will not be any specific statistical requirement to insure 
the convergence, and it therefore allows for an extension of the GE-analysis to systems of much greater complexity 
or size. 

The scheme is based on a maximum- likelyhood estimate of the density of state consistent with the set of distributions 
{Ni}i. In order not to overestimate metastable states, it is important, however, to distinguish between statistics 
arising from respectively equilibrium and non-equilibrium processes. Therefore we will first discuss a simple but 
general algorithm for doing so. Its basic principle is to assign a number 9 — 9{(fi) and a set 0(</>) to each (fi-bm during 
the simulation. This pair, (0,0), is updated on the same MC-rate as the distribution Ni, say after each timestep 
At. If (fit denotes the state of the system at time t and <fi' t is the test state for the last move (in case of acceptance, 
<fi' t — (fit) the quantities (9, 0) are given by the following rules: 

(1) Initially, (9, &)(cfi) d = (0, {}) for all states <j> except for the first one <fi , for which (9, Q)((fio) d = (1, {1}). 

(2) If the transition <fi t — > (fit+At takes the Markov chain into a new state, 9((fit+At) = 0, one puts 
(9, e)(& +At ) = f (0(<f>t) + 1, &(<t>t + At) U {9(<fit) + 1}). 

(3) If the transition <fi t — > (fit+At takes the Markov chain into an old state, 9((fit+At) > 0, one redefines (9, 0) for all 
states for which 0(0) > 9{<fi t+At ) as 9{(fi) d = f 9{(fi t+ At) and 6(0) d = f {9{(fi t +At)} U {m G 6(0) | m < 0(0t +A t)}- 

(4) After the value 9(<fit+At) has been determined by (2) or (3) ®{4>'t+At) 1S seperately updated as 

6(#+At) d = ®(ti+At) U {0(<fit + At)}. 

It follows, that the 0-value of the present state (fit of the simulation always equals the maximum value. The purpose 
of the ^-function is to partition the phase space into the kinetically connected or locally equilibrated regions. Each set 
&m — {0|$(0) — m } defines such a region, whereas the transition <f> m — > $ m / with m! > m, corresponds to a process 
which, within the total simulation time Ti spent at iteration i, effectively is out of equilibrium. The time-step At for 
this analysis should be large enough to reflect the local properties of the free energy landscape, f{(fi) ~ log(r(^)w(^)), 
i.e. At should be comparable to the (local) decorrelation time. Dependent on the choisc of At, some of the regions 
generated by the values of 9 might contain only very few counts. This will for instance be characteristical for the 
transition region between two minima of the free energy landscape. After the completion of the z'th simulation, a 
simple minimums criteria on the statistical content can be applied |l3| to discard such values of 9. Hereafter, 9 can 
be considered as a way of labelling the different effective 'macro-states' or basins observed in the simulation. 

The 'adjoint' quantity, 9, keeps track of the neighbourhood of each basin as well, <& m = {fi \ m e 6(^)} 2 $ m , by 
the inclusion of the observed but possibly rejected states (rule 4). The nescessity for this rather tedious book-keeping 
is that the rejection of a state may indicate that the corresponding free energy previously has been underestimated. 
The algorithm is schematically illustrated in fig. 1. 

Following the line of arguments, the distribution Ni(<fi) resulting from iteration i must be partitioned, Ni — > {Ni m } m , 
according to the different equilibrium regions {& m }m- If the simulation is done in the ST-ensemble, a natural further 
partioning, m — ► ml = ml, is defined by the different temperatures, 7); such that {^>i m '} m ' and {Ni m i} m r can be 
considered as sets and functions in energy space alone. For all types of ensembles, let Mi denote the total number 
of partitions at iteration i and let mj be the corresponding index variable. Furthermore, put s = {i,mi} such that a 
particular set or histogram obtained at a given iteration can be referred to simply as <5> s or N s respectively. 

The map, Fm, is readily constructed in the following way. The probability p s (E \T) of observing the energy E 
within the domain <!> s , given the density of states T and the weights uj s (E), is defined by 

Ps(E\T) = W^iEl, Zs V J2 u s (E)T(E) . 
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Under the assumption of (restricted) statistical independence, the histogram N s (E) will be a member of the multi- 
nomial probability distribution P s ; 



p s {N s \v) = n s \ n 



N S (E)\ 



where n s = J^e N a (E) is the total number of counts in $ s . The likelyhood C for observing the set of histograms 
{N s }fU (M = J2 t M i), is given by the product of the P s 's , i.e. C{{N s }f =l \T) = JI^i P s{N s \T). An efficient §J] 
estimate, T, of the true density of states can now be obtained by maximizing C with respect to T . This leads to the 
expression, 

f(E) = Ss=i N S {E) 

where S(E G $) = 1 if E G $ and zero otherwise. The partition functions Z s must be estimated selfconsistently from 
eq. (||). This set of estimates {Z s } can be expressed as the solution to the following M equations, y s =i,...,M', 

y d M V = i (3) 

In fact, these equations contain the multihistogram equations Jr^ | as a special case. This comparison together with 
the different numerical solving procedures mentioned below, will be discussed in details in a seperate paper p3| . 

By inspection of eq. (||) it is clear that each disconnected region of $ tot = U s =i will introduce a zero mode 
in the Jacobian corresponding to a (local) normalization constant in terms of an entropy or free energy, which 
can be chosen freely. In particular, an overall normalization constant has to be fixed in order to define a unique 
solution. A way around the problem of multiple zero modes, is to solve eq. (|^) seperately for each connected region 
and estimate the relations between the normalization constants afterwards by interpolating the slope of the entropy 
function S(E) = log(r(P)). Obviously, a reliable estimate of T is not obtained before $t Q t constitutes one overall 
connected set. 

The weights for simulation i + 1 is obtained by inserting the solution {Zg}^^ (M — J2l>=i Mi>) °f e 1- (H) m t° 
expression (|J) and use uJi+i = g{T). Contrary to the standard GE-iteration method, the Markov chain does not 
have to be reinitialized for the next iteration, not even if the Markov chain at iteration i ends in a trapped region of 
the phase space. Since the iteration scheme for u> keeps all the information previously obtained and since the new 
weights Ui+i by construction improves the phase space sampling, one can simply continue the next iteration from 
the final state <j) Ti of the last one. This makes the scheme very efficient in optimization problems. For the sake of 
completeness it should also be noted, that both the 'equilibrium sorting' algorithm and eq. can be directly 

applied to multidimensional state variables <f> as well. 

As a demonstration of the method we calculate the density of states of a self-avoiding homopolymer with a nearest- 
neighbour (non-bonded) potential of energy e = — 1 in a simple cubic lattice |is[| . Homopolymer models of this type 
have recently been the subject of extensive studies, both on- lattice flsH^] and off-lattice |l2,^3,|3. Order parameter 
related quantities have been calculated in |22] up to rather long on-lattice chains [L = 5000) , but estimates of heat- 
capacities or densities of states have been limited to much shorter chains (in all cases L < 400), due to the normal 
difficulty associated with obtaining converged results for these quantities f2q| . 

In this study, we have chosen L = 16 3 and performed three independent runs in the multicanonical ensemble, 
g(T) — r _1 , with the parameters r tot = J2i T i — 4 ' 10 9 an( i At = 4 • 10 3 . The energies are binned with AE = 8 
which implies that ~ 8 • 10 2 independent values of T has to be estimated. For each run the system is initialized by 
a random self-avoiding walk and with the infinite temperature weights w,-i(i?) = 1. The simulation time Tj in each 
iteration within a run is defined dynamically, roughly proportional to the size of the phase space observed Jl3| . The 
maximal 'memory' of the iteration scheme is set to M max = 100, which has turned out to be sufficient to contain the 
full history of each run. 

The results are shown in fig. 2. In the large figure the change of entropy AS compared to the random coil is 
plotted as function of the total energy E, for the three runs. Deviations can be observed for energies E < —5500, but 
otherwise the differences are less than the size of the data points. In the inset, these differences are depicted in terms 
of AAS* = AS— (AS)av: where (AS) av is the weighted average of the three runs. For energies E > —5500 the relative 
uncertainties are generally observed to be of the order ~ 10~ 3 . No thermodynamical transitions takes place below 
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E = —5500, and the results have converged in the sense that they are sufficiently accurate to reproduce the sensitive 
thermodynamic quantities such as the temperature dependence of the heat capacity. We refer to a forthcoming 
publication pfjj for this analysis. Keeping in mind the level of convergence and relating the size of the system to the 
total simulation time, the iteration method constitutes an improvement of at least one order of magnitude compared 
to previous studies. 

In fig. 3, a typical state of the homopolymer in the energy region E < —5500 is depicted. The state is completely 
compact, and its difference to the ground state (the box 16 3 ) is only due to the particular arrangement of the 'rigid' 
surface. The structure of the state testifies to the fact that the method is capable of investigating most of the coarse 
grained phase space, even within the relatively short MC-time r tot available. 

In summary, we have presented a new and general Monte Carlo iteration scheme for generalized ensembles (GE), 
which allows for an extension of the GE-analysis to systems of much greater size or complexity. In the future, we plan 
to test the method on off-lattice polymers and spin-glasses. 

It is a pleasure for me to acknowledge the helpful discussions with M.H. Jensen, K. Sneppen, P.-A. Lindgaard, G. 
Tiana and A.D. Jackson. 
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FIG. 1. Schematic illustration of the equilibrium sorting algorithm, (a) shows the free energy landscape f((j>) (in arbitrary 
units) with two minima as function of the state variable <j) (arbitrary numbers) for some system, which is initialized in state 4>q. 
(b) shows a possible time evolution of the system in a MC-simulation. The full line is the actual history, whereas the marks 
(+) show the test states of the Markov chain. Two non-equilibrium transitions are observed, <j>o — ► $1 and <E>i — > $2- Here, $1 
and $2 are the basins defined by the statistical relevant values of 9 (see text). The figure also shows the extended regions $i 
and $2 obtained by the inclusion of the test states within each basin. 



FIG. 2. The change of entropy AS compared to the random coil for a homopolymer with 16 3 monomers, as function of 
the total energy E. Three independent simulations have been performed with ~ 4 • 10 9 MC-steps. In the inset the differences 
between the results are shown in therms of AAS = AS — (AS) „, where (AS) Q „ is the weighted average of the three runs. 
The relative uncertainty of AS is generally observed to be of the order ~ 10 for energies E > —5500. 



FIG. 3. An example of a typical homopolymer state with L = 16 3 monomers in the low energy region (E < —5500). Only 
the particular arrangement of the 'rigid' surface makes it different from the true ground state. The structure of the state 
testifies to the fact, that the simulation is capable of investigating most of the relevant phase space within a relatively short 
MC-time ~ 4 ■ 10 9 . 
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